2015 Conference article Open Access

Entity linking on philosophical documents
Trani S., Ceccarelli D., De Francesco A., Perego R., Segala M., Tonellotto N.
Entity Linking consists of automatically enriching a document by detecting the text fragments that mention entities of an external knowledge base, e.g., Wikipedia. This problem is a hot research topic due to its impact on several text-understanding tasks. However, its application to specific, restricted topic domains has not received much attention. In this work we study how entity linking performance can be improved by exploiting a domain-oriented knowledge base, obtained by filtering out from Wikipedia the entities that are not relevant to the target domain. We focus on the philosophical domain and experiment with a combination of three entity filtering approaches: one based on the "Philosophy" category of Wikipedia, and two based on similarity metrics between philosophical documents and the textual descriptions of the entities in the knowledge base, namely cosine similarity and Kullback-Leibler divergence. We apply traditional entity linking strategies to the domain-oriented knowledge base obtained with these filtering techniques. Finally, we use the resulting enriched documents to conduct a preliminary user study with an expert in the area.
Source: Italian Information Retrieval Workshop, pp. 12–12, Cagliari, Italy, 25-26/05/2015

See at: ceur-ws.org Open Access | CNR ExploRA
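
To make the two similarity-based filters concrete, here is a minimal Python sketch of cosine similarity and Kullback-Leibler divergence between a domain corpus and an entity description, used to keep only entities that are close to the domain. It is an illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the smoothing constant `alpha`, the thresholds `min_cosine` and `max_kl`, and all function names are hypothetical, and requiring both filters at once is only one possible way of combining them.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    """Bag-of-words counts using naive lowercase whitespace tokenization."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def kl_divergence(p: Counter, q: Counter, alpha: float = 0.01) -> float:
    """KL(P || Q) over the union vocabulary; additive smoothing keeps every probability > 0."""
    vocab = p.keys() | q.keys()
    p_total = sum(p.values()) + alpha * len(vocab)
    q_total = sum(q.values()) + alpha * len(vocab)
    divergence = 0.0
    for t in vocab:
        p_t = (p[t] + alpha) / p_total
        q_t = (q[t] + alpha) / q_total
        divergence += p_t * math.log(p_t / q_t)
    return divergence


def filter_entities(domain_text: str, entities: dict, min_cosine: float = 0.1,
                    max_kl: float = 2.0) -> list:
    """Keep entity names whose description passes both (hypothetical) thresholds."""
    domain = term_counts(domain_text)
    kept = []
    for name, description in entities.items():
        desc = term_counts(description)
        if (cosine_similarity(domain, desc) >= min_cosine
                and kl_divergence(domain, desc) <= max_kl):
            kept.append(name)
    return kept


corpus = "kant ethics reason metaphysics"
candidates = {"Immanuel_Kant": "kant ethics reason", "Pizza": "cheese tomato dough"}
print(filter_entities(corpus, candidates))  # ['Immanuel_Kant']
```

In practice the thresholds would have to be tuned on the target collection, and a real pipeline would use proper tokenization and stopword removal.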

2015 Conference article Open Access

GERBIL: general entity annotator benchmarking framework
Usbeck R., Röder M., Ngomo A. -C. N., Baron C., Both A., Brümmer M., Ceccarelli D., Cornolti M., Cherix D., Eickmann B., Ferragina P., Lemke C., Moro A., Navigli R., Piccinno F., Rizzo G., Sack H., Speck R., Troncy R., Waitelonis J., Wesemann L.
We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights into the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers, allowing them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in a machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allow deriving insights into the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable, objective evaluation results.
Source: 24th International Conference on World Wide Web, pp. 1133–1143, Florence, Italy, 18-22/05/2015
DOI: 10.1145/2736277.2741626

See at: ISTI Repository Open Access | dl.acm.org Restricted | doi.org | CNR ExploRA
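
Among other measures, GERBIL reports micro- and macro-averaged F1. The Python sketch below shows how the two averages differ when annotations are compared by exact match on hypothetical (start, end, entity) tuples. It is not GERBIL's code: the tuple format, the exact-match criterion and the convention of scoring an undefined F1 as 0 are assumptions, and the framework itself may handle matching and edge cases differently.

```python
from typing import List, Set, Tuple

Annotation = Tuple[int, int, str]  # hypothetical (start offset, end offset, entity id)


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined (assumed convention)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_macro_f1(docs: List[Tuple[Set[Annotation], Set[Annotation]]]) -> Tuple[float, float]:
    """docs holds (gold, predicted) pairs; annotations match only if identical."""
    counts = []
    for gold, predicted in docs:
        tp = len(gold & predicted)
        counts.append((tp, len(predicted - gold), len(gold - predicted)))
    # Micro: pool the counts of all documents, then compute a single F1.
    micro = f1(sum(c[0] for c in counts), sum(c[1] for c in counts), sum(c[2] for c in counts))
    # Macro: compute F1 per document, then average.
    macro = sum(f1(*c) for c in counts) / len(counts)
    return micro, macro


docs = [
    ({(0, 5, "Rome")}, {(0, 5, "Rome")}),
    ({(0, 4, "Kant"), (10, 15, "Hegel")}, {(0, 4, "Kant"), (20, 24, "Marx")}),
]
micro, macro = micro_macro_f1(docs)
print(round(micro, 3), round(macro, 3))  # 0.667 0.75
```

Macro averaging gives every document equal weight, so a short, perfectly annotated document lifts it more than it lifts the micro score.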

2015 Conference article Open Access

On the impact of Entity Linking in microblog real-time filtering
Berardi G., Ceccarelli D., Esuli A., Marcheggiani D.
Microblogging is a model of content sharing in which the temporal locality of posts with respect to important events, either foreseeable or unforeseeable, makes real-time filtering applications of great practical interest. We propose the use of Entity Linking (EL) to improve retrieval effectiveness by enriching the representation of microblog posts and filtering queries. EL is the process of recognizing, in an unstructured text, mentions of relevant entities described in a knowledge base. EL of short pieces of text is a difficult task, but it is also a scenario in which the information EL adds to the text can have a substantial impact on the retrieval process. We implement a state-of-the-art filtering method, based on the best systems from the TREC Microblog track real-time ad hoc retrieval and filtering tasks, and extend it with a Wikipedia-based EL method. Results show that the use of EL significantly improves over non-EL versions of the filtering methods.
Source: SAC'15 - 30th Annual ACM Symposium on Applied Computing, pp. 1066–1071, Salamanca, Spain, 13-17 April 2015
DOI: 10.1145/2695664.2695761
DOI: 10.48550/arxiv.1611.03350

2014 Conference article Restricted

Manual annotation of semi-structured documents for entity-linking
Ceccarelli D., Lucchese C., Orlando S., Perego R., Trani S.
The Entity Linking (EL) problem consists of automatically linking short fragments of text within a document to entities in a given knowledge base like Wikipedia. Due to its impact on several text-understanding tasks, EL is a hot research topic. The related problem of identifying the most relevant entities mentioned in a document, a.k.a. salient entities (SE), is also attracting increasing interest. Unfortunately, publicly available evaluation datasets that contain accurate and supervised knowledge about mentioned entities and their relevance ranking are currently very poor in both number and quality. This shortage makes it very difficult to compare different EL and SE solutions on a fair basis, as well as to devise innovative techniques that rely on these datasets to train machine learning models, in turn used to automatically link and rank entities. In this demo paper we propose a Web-deployed tool that makes it possible to crowdsource the creation of these datasets by supporting the collaborative human annotation of semi-structured documents. The tool, called Elianto, is an open source framework that provides a user-friendly and reactive Web interface to support both EL and SE labelling tasks through a guided two-step process.
Source: CIKM'14 - 23rd ACM International Conference on Conference on Information and Knowledge Management, pp. 2075–2077, Shanghai, China, 3-7 November 2014
DOI: 10.1145/2661829.2661854

See at: dl.acm.org Restricted | doi.org | CNR ExploRA

2014 Conference article Restricted

Bringing head closer to the tail with entity linking
Verma M., Ceccarelli D.
With the creation and rapid development of knowledge bases, it has become easier to understand the underlying semantics of unstructured text (short or long) on the web. In this work we look specifically at the impact of entity linking on search logs. Search queries follow a Zipfian distribution in which, apart from a few popular queries (head queries), a significant percentage of queries (tail queries) occur rarely. Given a search log, there is sufficient data to analyze head queries but insufficient data (low frequency, limited clicks) to draw any conclusions about tail queries. In this work we focus on quantifying the extent of overlap between long-tail and head queries by means of entity linking. We specifically analyze the frequency distribution of entities in head and tail queries. Our analysis shows that, by means of entity linking, we can indeed bridge the gap between the head and the tail.
Source: ESAIR'14 - 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval, pp. 37–39, Shanghai, China, 7 November 2014
DOI: 10.1145/2663712.2666196

See at: dl.acm.org Restricted | doi.org | CNR ExploRA
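
The overlap analysis can be pictured as follows: split a query log into head and tail by frequency, collect the entities linked in head queries, and measure the fraction of entity-linked tail queries that share at least one entity with the head. This minimal Python sketch assumes the entity annotations were already produced by some entity linker; the frequency threshold, data layout and function name are hypothetical and not taken from the paper.

```python
def tail_coverage(query_freq: dict, query_entities: dict, head_threshold: int) -> float:
    """Fraction of entity-linked tail queries sharing at least one entity with head queries.

    query_freq maps query -> frequency in the log; query_entities maps query -> linked
    entity ids (assumed to come from an external entity linker).
    """
    head = {q for q, f in query_freq.items() if f >= head_threshold}
    tail = [q for q in query_freq if q not in head]

    # Union of all entities linked in head queries.
    head_entities = set()
    for q in head:
        head_entities.update(query_entities.get(q, ()))

    # Only tail queries with at least one linked entity can overlap.
    linked_tail = [q for q in tail if query_entities.get(q)]
    if not linked_tail:
        return 0.0
    covered = sum(1 for q in linked_tail if head_entities & set(query_entities[q]))
    return covered / len(linked_tail)


log = {"barack obama": 1000, "obama birthplace hawaii": 2, "hawaii beaches": 3, "xyzzy": 1}
links = {
    "barack obama": ["Barack_Obama"],
    "obama birthplace hawaii": ["Barack_Obama", "Hawaii"],
    "hawaii beaches": ["Hawaii"],
    "xyzzy": [],
}
print(tail_coverage(log, links, head_threshold=100))  # 0.5
```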

2013 Conference article Open Access

Dexter: an open source framework for entity linking
Ceccarelli D., Lucchese C., Orlando S., Perego R., Trani S.
We introduce Dexter, an open source framework for entity linking. The entity linking task aims at identifying all the small text fragments in a document referring to an entity contained in a given knowledge base, e.g., Wikipedia. The annotation is usually organized in three tasks. Given an input document, the first task consists of discovering the fragments that could refer to an entity. Since a mention could refer to multiple entities, it is necessary to perform a disambiguation step, in which the correct entity is selected among the candidates.
Source: ESAIR'13 - Sixth International Workshop on Exploiting Semantic Annotations in Information Retrieval, pp. 17–20, San Francisco, USA, 27 October - 1 November 2013
DOI: 10.1145/2513204.2513212

See at: doi.org Restricted | CNR ExploRA
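
The two steps named in the abstract, spotting candidate fragments and disambiguating each mention, can be sketched with a toy anchor dictionary and the classic commonness prior (the fraction of times an anchor text links to a given entity). This Python sketch is not Dexter's API: the `anchors` dictionary and its counts, the greedy longest-match spotter and the function names are illustrative assumptions, and real systems combine commonness with context and relatedness signals.

```python
def spot(text: str, anchors: dict, max_ngram: int = 3) -> list:
    """Greedy longest-match spotting of n-grams that appear in the anchor dictionary."""
    tokens = text.lower().split()
    mentions, i = [], 0
    while i < len(tokens):
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in anchors:
                mentions.append(candidate)
                i += n  # skip the tokens covered by this mention
                break
        else:
            i += 1  # no anchor starts at this token
    return mentions


def annotate(text: str, anchors: dict, max_ngram: int = 3) -> list:
    """Spot mentions, then pick the candidate entity with the highest commonness."""
    results = []
    for mention in spot(text, anchors, max_ngram):
        candidates = anchors[mention]  # entity -> times the anchor linked to it
        total = sum(candidates.values())
        entity = max(candidates, key=candidates.get)
        results.append((mention, entity, candidates[entity] / total))
    return results


# Toy anchor dictionary with invented counts, for illustration only.
anchors = {
    "jaguar": {"Jaguar_Cars": 70, "Jaguar": 30},
    "new york": {"New_York_City": 90, "New_York_(state)": 10},
    "york": {"York": 100},
}
print(annotate("The jaguar drove to New York", anchors))
# [('jaguar', 'Jaguar_Cars', 0.7), ('new york', 'New_York_City', 0.9)]
```

Longest-match spotting keeps "new york" from also producing the nested "york" mention; a commonness-only disambiguator ignores context, which is exactly the gap later disambiguation steps are meant to close.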

2013 Conference article Open Access

Twitter anticipates bursts of requests for Wikipedia articles
Tolomei G., Orlando S., Ceccarelli D., Lucchese C.
Most of the tweets that users exchange on Twitter make implicit mentions of named entities, which in turn can be mapped to the corresponding Wikipedia articles using proper Entity Linking (EL) techniques. Some of these become trending entities on Twitter due to a long-lasting or a sudden effect on the volume of tweets in which they are mentioned. We argue that the set of trending entities discovered from Twitter may help predict the volume of requests for the related Wikipedia articles. To validate this claim, we apply an EL technique to extract trending entities from a large dataset of public tweets. Then, we analyze the time series derived from the hourly trending score (i.e., an index of popularity) of each entity as measured by Twitter and Wikipedia, respectively. Our results reveal that Twitter actually leads Wikipedia by one or more hours.
Source: DUBMOD '13 - 2013 Workshop on data-driven user behavioral modelling and mining from social media, pp. 5–8, San Francisco, USA, 27 October - 1 November 2013
DOI: 10.1145/2513577.2538768

See at: dl.acm.org Open Access | www.dsi.unive.it | doi.org Restricted | CNR ExploRA
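
One common way to check whether one time series leads another is to find the lag that maximizes the correlation between them. The Python sketch below does this for two hourly series; the abstract does not state which measure the paper uses, so the choice of Pearson cross-correlation, the `max_lag` window, the toy series and the function names are all assumptions.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def best_lag(leader: list, follower: list, max_lag: int) -> tuple:
    """Lag in [0, max_lag] maximizing corr(leader[t], follower[t + lag])."""
    best = (0, pearson(leader, follower))
    for lag in range(1, max_lag + 1):
        r = pearson(leader[:-lag], follower[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


twitter = [0, 0, 1, 5, 2, 0, 0, 0, 3, 8, 1, 0]
wikipedia = [0, 0] + twitter[:-2]  # same shape, delayed by two hours
lag, r = best_lag(twitter, wikipedia, max_lag=4)
print(lag, round(r, 3))  # 2 1.0
```

A positive best lag means that peaks in the first series tend to precede peaks in the second; it does not by itself establish causation.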

2013 Conference article Open Access

When entities meet query recommender systems: semantic search shortcuts
Ceccarelli D., Gordea S., Lucchese C., Nardini F. M., Perego R.
The Web of Data is growing in popularity and size, and entities are gaining importance in many research fields. In this paper, we explore the use of entities that can be extracted from a query log to enhance query recommendation. In particular, we use a large query log recorded by the Europeana portal, a central access point to the descriptions of more than 20 million cultural heritage objects, and we extend a state-of-the-art query recommendation algorithm to take into account the semantic information associated with the submitted queries. Our novel method generates highly related and diversified suggestions. We assess it by means of a new evaluation technique. The manually annotated dataset used for performance comparisons has been made available to the research community to favor the repeatability of the experiments.
Source: SAC '13 - 28th Annual ACM Symposium on Applied Computing, pp. 933–938, Coimbra, Portugal, 18-22 March 2013
DOI: 10.1145/2480362.2480540

See at: dl.acm.org Open Access | doi.org Restricted | CNR ExploRA

2013 Conference article Open Access

Learning relatedness measures for entity linking
Ceccarelli D., Lucchese C., Orlando S., Perego R., Trani S.
Entity Linking is the task of detecting, in text documents, relevant mentions of entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowledge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant entities selected for annotation, since this minimizes disambiguation errors. The definition of an effective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed and, more importantly, improves the overall performance of different state-of-the-art entity-linking algorithms.
Source: CIKM'2013 - 22nd ACM International Conference on Information & Knowledge Management, pp. 139–148, San Francisco, USA, 27 October - 1 November 2013
DOI: 10.1145/2505515.2505711

See at: www.dsi.unive.it Open Access | dl.acm.org Restricted | doi.org | CNR ExploRA
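
For context, a widely used hand-crafted relatedness function is Milne and Witten's link-based measure, computed from the sets of Wikipedia pages that link to each entity. The abstract does not name the baselines it compares against, so this Python sketch is only an example of the kind of function a learned relatedness model can be compared with or use as a feature; it does not reproduce the learning-to-rank approach, and the function name and inputs are illustrative. It assumes `num_pages` is larger than either in-link set.

```python
import math


def milne_witten(links_a: set, links_b: set, num_pages: int) -> float:
    """Link-based relatedness from the sets of pages linking to entities a and b."""
    common = len(links_a & links_b)
    if common == 0:
        return 0.0  # no shared in-links: treated as unrelated
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    distance = (math.log(larger) - math.log(common)) / (
        math.log(num_pages) - math.log(smaller))
    return max(0.0, 1.0 - distance)  # clamp, since the distance can exceed 1


print(round(milne_witten({1, 2, 3, 4}, {3, 4, 5, 6}, num_pages=1000), 3))  # 0.874
```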

2012 Conference article Restricted

You should read this! Let me explain you why! - Explaining news recommendations to users.
Blanco R., Ceccarelli D., Lucchese C., Perego R., Silvestri F.
Recommender systems have become ubiquitous in content-based web applications, from news to shopping sites. Nonetheless, an aspect that has been largely overlooked so far in the recommender system literature is that of automatically building explanations for a particular recommendation. This paper focuses on the news domain and proposes to enhance the effectiveness of news recommender systems by adding, to each recommendation, an explanatory statement to help the user better understand if, and why, the item may be of interest to her. We consider the news recommender system as a black box and generate different types of explanations employing pieces of information associated with the news. In particular, we engineer text-based, entity-based, and usage-based explanations, and use Markov Logic Networks to rank the explanations on the basis of their effectiveness. The assessment of the model is conducted via a user study on a dataset of news read consecutively by actual users. Experiments show that news recommender systems can greatly benefit from our explanation module.
Source: 21st ACM International conference on Information and knowledge management, pp. 1995–1999, Maui, Hawaii, 29 October - 2 November 2012
DOI: 10.1145/2396761.2398559

See at: dl.acm.org Restricted | doi.org | CNR ExploRA

2012 Conference article Restricted

Introducing RDF graph summary with application to assisted SPARQL formulation
Campinas S., Perry T., Ceccarelli D., Delbru R., Tummarello G.
One of the reasons for the slow adoption of SPARQL is the complexity of query formulation due to data diversity. The principal barrier a user faces when trying to formulate a query is that he generally has no information about the underlying structure and vocabulary of the data. In this paper, we address this problem at the maximum scale we can think of: providing assistance in formulating SPARQL queries over the entire Sindice data collection, 15 billion triples and counting, coming from more than 300K datasets. We present a method to help users formulate complex SPARQL queries across multiple heterogeneous data sources. Even if the structure and vocabulary of the data sources are unknown to the user, the user is able to quickly and easily formulate his queries. Our method is based on a summary of the data graph and assists the user during interactive query formulation by recommending possible structural query elements.
Source: 23rd International Workshop on Database and Expert Systems Applications. 11th International Workshop on Web Semantics and Information Processing, pp. 261–266, Vienna, 3-7 September 2012
DOI: 10.1109/dexa.2012.38
Project(s): GEOKNOW via OpenAIRE

See at: doi.org Restricted | ieeexplore.ieee.org | CNR ExploRA

2011 Journal article Restricted

Discovering Europeana Users' Search Behavior
Ceccarelli D., Gordea S., Lucchese C., Nardini F. M., Perego R., Tolomei G.
Europeana is a strategic project funded by the European Commission with the goal of making Europe's cultural and scientific heritage accessible to the public. ASSETS is a two-year Best Practice Network co-funded by the CIP PSP Programme to improve the performance, accessibility and usability of the Europeana search engine. Here we present a characterization of the Europeana logs, showing statistics on common behavioral patterns of Europeana users.
Source: ERCIM News 86 (2011): 39–40.

See at: ercim-news.ercim.eu Restricted | CNR ExploRA

2011 Conference article Open Access

Caching query-biased snippets for efficient retrieval
Lucchese C., Perego R., Ceccarelli D., Silvestri F., Orlando S.
Web Search Engines' result pages contain references to the top-k documents relevant to the query submitted by a user. Each document is represented by a title, a snippet and a URL. Snippets, i.e., short sentences showing the portions of the document relevant to the query, help users select the most interesting results. The snippet generation process is very expensive, since it may require accessing a number of documents for each issued query. We assert that caching, a popular technique used to enhance performance at various levels of any computing system, can be very effective in this context. We design and experiment with several cache organizations, and we introduce the concept of supersnippet, that is, the set of sentences in a document that are most likely to answer future queries. We show that supersnippets can be built by exploiting query logs, and that in our experiments a supersnippet cache answers up to 62% of the requests, remarkably outperforming other caching approaches.
Source: 14th International Conference on Extending Database Technology, EDBT/ICDT '11, pp. 93–104, Uppsala, Sweden, March 21-24 2011
DOI: 10.1145/1951365.1951379
Project(s): ASSETS

See at: www.dsi.unive.it Open Access | doi.org Restricted | portal.acm.org | CNR ExploRA
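
The supersnippet idea can be illustrated with a toy heuristic: score a document's sentences by how often their terms appear in past queries, keep the top `k` as the supersnippet, and answer a new query from the cached sentences when any of them matches. This Python sketch is an illustration under assumptions, not the paper's construction or cache organization: the term-frequency scoring, the `k` and `max_sentences` parameters, the sample data and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def build_supersnippet(sentences: List[str], past_queries: List[str], k: int) -> List[str]:
    """Keep the k sentences whose terms occur most often in past queries (document order)."""
    query_terms = Counter(t for q in past_queries for t in tokenize(q))

    def score(sentence: str) -> int:
        return sum(query_terms[t] for t in set(tokenize(sentence)))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def answer_from_cache(supersnippet: List[str], query: str,
                      max_sentences: int = 2) -> Optional[str]:
    """Build a snippet from cached sentences sharing a term with the query; None is a miss."""
    terms = set(tokenize(query))
    matching = [s for s in supersnippet if terms & set(tokenize(s))]
    return " ".join(matching[:max_sentences]) if matching else None


doc = ["Pisa is a city in Tuscany.", "The tower of Pisa leans.", "Local weather is mild."]
log = ["pisa tower", "leaning tower", "tuscany cities"]
cache = build_supersnippet(doc, log, k=2)
print(answer_from_cache(cache, "tower height"))  # The tower of Pisa leans.
print(answer_from_cache(cache, "weather"))       # None
```

A miss (`None`) is where a real system would fall back to generating the snippet from the full document.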

2011 Conference article Restricted

Improving Europeana search experience using query logs
Ceccarelli D., Gordea S., Lucchese C., Nardini F. M., Tolomei G.
Europeana is a long-term project funded by the European Commission with the goal of making Europe's cultural and scientific heritage accessible to the public. Since 2008, about 1500 institutions have contributed to Europeana, enabling people to explore the digital resources of Europe's museums, libraries and archives. The huge amount of collected multilingual, multimedia data is made available today through the Europeana portal, a search engine allowing users to explore such content through textual queries. One of the most important techniques for enhancing users' search experience in large information spaces is the exploitation of the knowledge contained in query logs. In this paper we present a characterization of the Europeana query log, showing statistics on common behavioral patterns of Europeana users. Our analysis highlights some significant differences between the Europeana query log and the historical data collected in general-purpose Web Search Engine logs. In particular, we find that both the query and the search session distributions show different behaviors. Finally, we use this information to design a query recommendation technique aimed at enhancing the functionality of the Europeana portal.
Source: Research and Advanced Technology for Digital Libraries. International Conference on Theory and Practice of Digital Libraries, pp. 384–395, Berlin, Germany, 26-28 September 2011
DOI: 10.1007/978-3-642-24469-8_39
Project(s): ASSETS

See at: doi.org Restricted | gateway.webofknowledge.com | www.springerlink.com | CNR ExploRA

2011 Conference article Restricted

The Sindice-2011 dataset for entity-oriented search in the Web of data
Campinas S., Ceccarelli D., Perry T., Delbru R., Tummarello G., Balog K.
The task of entity retrieval is becoming increasingly prevalent as more and more (semi-)structured information about objects is available on the Web in the form of documents embedding metadata (RDF, RDFa, Microformats and others). However, research and development in that direction depends on (1) the availability of a representative corpus of entities that are found on the Web, and (2) the availability of an entity-oriented search infrastructure for experimenting with new retrieval models. In this paper, we introduce the Sindice-2011 data collection, which is derived from the data collected by the Sindice semantic search engine. The data collection is especially designed to support research in the domain of web entity retrieval. We describe how the corpus is organised, discuss statistics of the data collection, and introduce a search infrastructure to foster research and development.
Source: 1st International Workshop on Entity-Oriented Search, pp. 26–32, Beijing, 28 July 2011

See at: research.microsoft.com Restricted | CNR ExploRA

2011 Report Open Access

ASSETS - The ASSETS API
Briguglio L., Gordea S., Lindley A., Tzoannos E., Meghini C., Cardillo F. A., Esuli A., Falchi F., Ceccarelli D., Bolettieri P., Aloia N., Concordia C., Valdes V., Lopez F., Martinez J. M., Bescos J., Castells P., Garcia M. A., Paytuvi O., Lazaridis M., Beloued A., Spyratos N., Sugibuchi T.
This is a technical document detailing the ASSETS architecture and the API of each component. It integrates and extends results of the first year, mainly from T2.0.4 "Platform design and implementation guidelines" and T2.0.5 "API Specifications", but it also introduces technical aspects of all the software services defined, analysed, implemented and tested in ASSETS WP2.1, WP2.2, WP2.3, WP2.4, WP2.5 and WP3.2. This document provides the following information regarding the ASSETS services: the rationale behind the service choices; the approach and methodology of the proposed solutions; the service descriptions and the definition of their interfaces (APIs); the data models and data flows exchanged between services. This documentation is the basis for the development activities, because it identifies the components, their responsibilities, data models and interfaces. Through iteration, as already happened during the first year, the data models and interfaces will be refined and enriched with further details. Moreover, although an improved user interface is expected in the coming months, this document already presents some previews, with preliminary results and mock-ups produced during this year. These are shown to better describe and clarify how the ASSETS services are, or will be, used to improve access to and usability of Europeana.
Source: Project report, ASSETS, Deliverable D2.0.4, 2011
Project(s): ASSETS

See at: ISTI Repository Open Access | CNR ExploRA

2011 Report Unknown

ASSETS - Interface specifications and system design
Briguglio L., Gordea S., Lindley A., Tzoannos E., Meghini Ca., Cardillo F. A., Esuli A., Falchi F., Ceccarelli D., Bolettieri P., Aloia N., Concordia C., Valdes V., Paytuvi O., Lazaridis M., Beloued A., Spyratos N., Sugibuchi T.
This internal document provides the following information regarding the ASSETS services: the service description; the definition of the interfaces; the data models and data flows exchanged between services. It integrates results from both T2.0.4 "Platform design and implementation guidelines" and T2.0.5 "API Specifications". This documentation is the basis for the development activities, because it identifies the components, their responsibilities and a preliminary definition of data models and interfaces. Through iteration, the data models and interfaces will be refined and enriched with further details. In this perspective, this documentation could be considered the most important contribution to the next deliverable, D2.0.4 "The ASSETS APIs", expected for month 12.
Source: Project report, ASSETS, Deliverable D2.0.2, 2011
Project(s): ASSETS

See at: CNR ExploRA